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Abstract 

Mitochondria are intracellular organelles where oxidative phosphorylation is carried out to complete ATP synthesis. Mitochondria 
have their own genome; in metazoans, this is a small, circular molecule encoding 13 electron transport proteins, 22 tRNAs, and 2 
rRNAs. In invertebrates, mitochondrial gene rearrangement is common, and it is correlated with increased substitution rates. In 
vertebrates, mitochondrial gene rearrangement is rare, and its relationship to substitution rate remains unexplored. Mitochondrial 
genes can also show spatial variation in substitution rates around the genome due to the mechanism of mtDNA replication, which 
produces a mutation gradient. To date, however, the strength of the mutation gradient and whether movement along the gradient 
in rearranged (or otherwise modified) genomes impacts genie substitution rates remain unexplored in the majority of vertebrates. 
Salamanders include both normal mitochondrial genomes and independently derived rearrangements and expansions, providing a 
rare opportunity to test the effects of large-scale changes to genome architecture on vertebrate mitochondrial gene sequence 
evolution. We show that: 1) rearranged/expanded genomes have higher substitution rates; 2) most genes in rearranged/expanded 
genomes maintain their position along the mutation gradient, substitution rates of the genes that do move are unaffected by their 
new position, and the gradient in salamanders is weak; and 3) genomic rearrangements/expansions occur independent of levels of 
selective constraint on genes. Together, our results demonstrate that large-scale changes to genome architecture impact mitochon- 
drial gene evolution in predictable ways; however, despite these impacts, the same functional constraints act on mitochondrial 
protein-coding genes in both modified and normal genomes. 
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Introduction 

Mitochondria are intracellular organelles where oxidative 
phosphorylation (OXPHOS) is carried out to complete the 
process of ATP synthesis via cellular respiration. These organ- 
elles have their own genome (retained from their alpha- 
proteobacterial ancestor), which over evolutionary time has 
experienced extensive gene transfer to the nucleus 
(Andersson and Kurland 1998; Gray et al. 2001; Adams and 
Palmer 2003). In metazoans, this streamlining process has 
resulted in a small, circular DNA molecule encoding 13 pep- 
tides essential for electron transport, 22 tRNAs, and 2 rRNAs 
(Scheffler 2008). In addition, the mitochondrial genome 
also contains a control region (CR), a noncoding region 
that contains a replication origin and transcriptional promoter 
(Boore 1999; Saccone et al. 1999; Gissi et al. 2008; Scheffler 



2008). Because of its small size, the mitochondrial 
genome has served as an important tractable model for 
examining the relationships among the different forces shap- 
ing the evolution of genes and genomes. Such forces include 
point mutations, larger mutations impacting genome size 
and architecture, selection on protein function, selection on 
transcriptional and translational efficiency, and genetic drift 
(Moritz et al. 1987; Rand 1993; Ballard and Dean 2001; 
Fernandez-Silva et al. 2003; Rand et al. 2004; Lynch et al. 
2006; Detmer and Chan 2007; Scheffler 2008; Galtier et al. 
2009; Boussau et al. 2011). Despite much progress, many 
important questions remain unanswered about how these 
forces interact to drive molecular evolution in mitochondrial 
genomes. 

Although they have evolutionarily conserved genomic 
content, metazoan mitochondrial genomes show a diversity 
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of gene orders (Boore and Brown 1998; Boore 1999; Saccone 
et al. 1999; Xu et al. 2006). Rearrangement of gene order, 
often accompanied by expansion of noncoding regions, re- 
sults from gene duplications followed by random loss of one 
of the paralogs (Moritz and Brown 1 987; Boore 1 999; Mueller 
and Boore 2005; San Mauro et al. 2006) or by intramolecular 
recombination (Stanton et al. 1994). Variation in gene order 
is substantial across invertebrates, but is far less common in 
vertebrates (Boore 1999; Scheffler 2008). The reasons why 
some clades have more dynamic genomes than others 
remain unknown. The high rates of gene rearrangement 
in invertebrate mitochondrial genomes have been positively 
correlated with rates of nucleotide substitution within mito- 
chondrial genes in mollusks, insects, crustaceans, and other 
arthropods (Hoffmann et al. 1992; Hoeh et al. 1996; Shao 
et al. 2003; Xu et al. 2006). The relationship between gene 
rearrangement and substitution rate is unexplored in verte- 
brates, and the mechanisms underlying this correlation 
remain unknown across all taxa. 



In addition to variation in substitution rates across taxa as- 
sociated with gene rearrangements, mitochondrial genes in 
some taxa also show spatial variation in substitution rates 
around the genome due to the mechanism underlying mito- 
chondrial DNA (mtDNA) replication. Vertebrate mitochondrial 
replication initiates from two replication origins that are offset 
from each other, causing asynchronous replication across 
both strands (fig. 1). Specifically, mtDNA replication begins 
at the H-strand replication origin (0 H ), displacing one strand 
and leaving it single-stranded, until the newly synthesized 
strand reaches the L-stand replication origin (0 L ) and synthesis 
begins in the reverse direction along the displaced strand. 
Because single-stranded DNA is prone to mutation, mtDNA 
replication results in a gradient of mutation accumulation 
where regions that persist in the single-stranded state for 
longer experience more AT-biased mutations (Reyes et al. 
1998; Faith and Pollock 2003; Krishnan et al. 2004; 
Broughton and Reneau 2006; Scheffler 2008). The mitochon- 
drial environment is mutagenic, in part, because of the 
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Fig. 1. — Schematic representation of vertebrate mitochondrial genome replication. Replication begins at the 0 H and proceeds along the heavy strand. 
Once the replication fork passes the 0 L: replication begins along the light strand. 
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presence of reactive oxygen species (ROS), byproducts of 
oxidative phosphorylation (Scheffler 2008). Metabolic rates 
vary dramatically across vertebrates, which may, in turn, 
yield different levels of mutagenicity, impacting the strength 
of the mutation gradient (Gatten et al. 1992; Santos 2012). 
To date, however, 1) the strength of the mutation gradient, 
2) whether the mutation gradient produces a substitution 
rate gradient, and 3) whether genes in rearranged and/or 
expanded genomes move to more or less vulnerable positions 
along the gradient remain unexplored in the majority of 
vertebrates. 

Substitution rates in mitochondrial genes are also shaped 
by functional constraints on mitochondrial proteins, which 
differ across both genes and lineages; these differences are 
associated with variation in organismal traits (Martin and 
Palumbi 1993; Rand 1994; Galtier et al. 2009; Sun et al. 
2011). For example, high metabolic energy demands in 
birds and mammals are positively correlated with strong se- 
lective constraint (i.e., low levels of nonsynonymous relative to 
synonymous substitutions) on mitochondrial genes, whereas 
low metabolic demand in salamanders is correlated with 
weaker selective constraint (i.e., higher levels of nonsynon- 
ymous substitutions) (Shen et al. 2009; Chong and Mueller 
2013). It remains unclear whether organisms experiencing 
weaker selection on mitochondrial gene sequences, which 
underlie OXPHOS protein function, also experience weaker 
selection on genome architecture (i.e., genome size and 
order), which impacts OXPHOS protein transcription and 
translation. 

Salamanders, an amphibian clade of 648 species, include 
both normal mitochondrial genomes that have the verte- 
brate consensus mitochondrial gene order and a range of in- 
dependently derived, modified genomes that have both gene 
order rearrangements and genome expansions. Because 
this genomic diversity is unusual among vertebrates, salaman- 
ders provide a rare opportunity to test the effects of genomic 
modification on vertebrate mitochondrial gene sequence 
evolution. In this study, we show that: 1) mitochondrial pro- 
tein-coding genes within modified salamander genomes have 
significantly higher synonymous and nonsynonymous substi- 
tution rates than genes within normal salamander genomes; 
2) despite expansions of up to 6 kb, most genes in modified 
salamander genomes maintain their position along the muta- 
tion gradient; the genes that do move are not substantially 
impacted by their new position; and the mutation gradient in 
salamanders is weak; and 3) gene rearrangements and geno- 
mic expansion events occur independent of levels of selective 
constraint acting on mitochondrial genes. Taken together, our 
results demonstrate that large-scale changes to genome ar- 
chitecture impact mitochondrial gene sequence evolution in 
predictable ways within salamanders; however, despite these 
impacts, the same functional constraints are acting on mito- 
chondrial protein-coding genes in both modified and normal 
salamander genomes. 



Materials and Methods 

Sequence Data and Genome Characteristics 

We obtained 62 complete salamander mitochondrial genome 
sequences, each representing a unique species, from the 
GenBank RefSeq database (release 53). Our data set includes 
representatives from 6 of the 10 salamander families, with 
predominant taxonomic sampling representing the largest 
family, Plethodontidae. We extracted all 13 protein-coding 
genes from each mitochondrial genome based on GenBank 
genome annotations. 

We characterized mitochondrial genomes based on gene 
order and genome size, where genomes with gene arrange- 
ments that deviated from the consensus vertebrate mitochon- 
drial gene order and genomes that were expanded (> 1 7.5 kb) 
were labeled as modified (in contrast to normal). Gene order 
was described relative to the CR (the initiation site for mtDNA 
replication and transcription), beginning with tRNA-Phe, using 
a purpose-built perl script (Minxiao et al. 201 1). We extracted 
mitochondrial genome size data from the GenBank files. 
These results are summarized in supplementary file S1, 
Supplementary Material online. 

Compositional analyses were carried out for all genomes in 
the data set using custom perl scripts (available by request 
from author). For each of the 13 protein-coding genes, 
we estimated base composition for the whole gene and for 
the third position of 4-fold degenerate codons (P 4F d): alanine- 
GCN, glycine-GGN, leucine-CTN, proline-CCN, arginine-CGN, 
serine-TCN, threonine-CAN, and valine-GTN. Base frequen- 
cies, as well as AT skew and GC skew, were also calculated 
for the whole genome. These results are summarized in sup- 
plementary file S1 , Supplementary Material online. We tested 
for differences in AT skew and GC skew values between 
normal and modified genomes using a two-way Mann- 
Whitney test implemented in R. 

Initial Phylogenetic Analysis 

Multiplesequence alignments were performed based on amino 
acid sequences for each mitochondrial protein-coding gene 
using MUSCLE v.3.8 (Edgar 2004). We then estimated a max- 
imum likelihood tree for the concatenated mitochondrial gene 
data set using RAxML v.7.2 (Stamatakis 2006). The data were 
partitioned by gene and codon position and analyzed using the 
GTR + r model of nucleotide substitution for each partition. 

Nonsynonsymous and Synonymous Substitution Analysis 

For each mitochondrial gene, we estimated nonsynonymous 
(d/V) and synonymous (dS) substitution rates. We first 
estimated d/V and dS using a single value for each parameter 
across all branches (Model 0 in Codeml, implemented 
in PAML v4.4 [Yang 2007]) using the fixed topology of the 
maximum likelihood tree estimated from the concatenated 
mitochondrial sequences. Subsequently, we reestimated 
these parameters under Model 1 (in Codeml), which allows 
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these parameter values to vary for each branch. These results 
are summarized in supplementary file S1, Supplementary 
Material online. For each gene, we tested for differences in 
d/V and dS values between normal and modified genomes 
using a two-way Mann-Whitney test implemented in R. 

Mutation Gradient Analysis 

The site-specific mutation rate around the mitochondrial 
genome is associated with the duration of time spent single 
stranded, which is determined by the position of a site relative 
to the origins of replication. The duration of single-stranded 
state of the parental H strand (D ssH ) is defined by the duration 
between the displacement of the heavy strand by the replica- 
tion fork and synthesis of its complement; for a given gene, 
it is estimated using the formula D ssH = (L-2 [x-OJ)/L for 
the two genes located upstream of the origins of replication, 
ND1 and ND2, and D ssH = (2 [x-Oj)/L for the remaining 
genes, where L is the total length of the genome, 0 L is the 
position of the light strand origin of replication, and x is the 
midpoint of a gene (Tanaka and Ozawa 1994; Reyes et al. 
1 998). We used a modification of these formulas to estimate 
absolute mutation position, which is not standardized by 
genome size, for each mitochondrial gene for all genomes 
because our goals include comparing estimates of mutation 
position for a given gene across genomes of different sizes. 
Specifically, we hypothesize that rearrangements and expan- 
sions may alter the mutation position of genes in modified 
genomes. Mutation position is estimated using (L — 2 [x— OJ) 
for the two genes located upstream of the origins of replica- 
tion, ND1 and ND2, and (2 [x- OJ) for the remaining genes. 

Functional Constraint on Mitochondrial Genes 

To test whether mitochondrial genome expansions and rear- 
rangements reflect an overall relaxation of selective constraint 
on mitochondrial function, we estimated co, which is the ratio 
of nonsynonymous to synonymous substitutions (oj = d/V/dS), 
for each mitochondrial gene. The ratio co is used to measure the 
strength of selection, where for values of co between 0 and 1 , a 
smaller co indicates stronger purifying selection and a larger co 
indicates weaker purifying selection. We first estimated co using 
a single value across all branches (Model 0 in Codeml, imple- 
mented in PAML v4.4 [Yang 2007]) using the fixed topology 
of the concatenated mitochondrial maximum likelihood tree. 
We then reestimated co under Model 1 where parameter 
values can vary for all branches. These results are summarized 
in supplementary file S1, Supplementary Material online. For 
each gene, we tested whether co was greater in modified ge- 
nomes using a one-way Mann-Whitney test implemented in R. 

Results and Discussion 

Mitochondrial Genome Characteristics 

For mitochondrial genomes from 62 salamander species, we 
estimated genome composition and organization to identify 



and characterize the differences between normal and modi- 
fied genomes. We identified 14 mitochondrial genomes as 
modified based on evidence of gene rearrangement, a large 
increase in genome size (> 1 7.5 kb total length), or a combi- 
nation of the two. Eight of these modified genomes show 
extensive gene rearrangements that include one or more of 
the origins of replication and the regions flanking them, 
including both protein-coding genes and tRNAs. The remain- 
ing six modified genomes maintain the ancestral vertebrate 
gene order, but are larger than 1 9 kb in size. This size increase 
reflects the accumulation of tandem repeats of noncoding 
sequence in the CR and/or in the IGS, an intergenic spacer 
region between tRNA-Thr and tRNA-Pro present in diverse 
salamander clades (Wallis 1987; McKnight and Shaffer 
1 997; Mueller and Boore 2005). Similar patterns of noncoding 
sequence accumulation exist in other taxonomic groups (e.g., 
mammals, caecilians, fish, and invertebrates) (Stewart and 
Baker 1994; Prager et al. 1996; Delarbre et al. 2001; San 
Mauro et al. 2006; Minxiao et al. 2011). Tandem repetitive 
noncoding sequences are also present in seven of the eight 
genomes with rearranged gene order. Overall, genome size is 
significantly larger in modified salamander genomes 
(1 9,775 ± 1,809 bp) than in normal salamander genomes 
(16,475 ±209 bp). Both pseudogenes and additional copies 
of duplicate genes are present in at least six of the modified 
salamander mitochondrial genomes, suggesting that these 
rearrangements were mediated by duplication of a portion 
of the genome (Mueller and Boore 2005). The localization 
of genomic modification to regions containing and flanking 
the two origins of replication, as well as the presence of 
tandem repetitive sequences, suggests that these genomic 
regions are particularly susceptible to slipped-strand mispair- 
ing, intramolecular recombination, and imprecise replication. 

We estimated several measures of genome composition for 
normal and modified genomes and compared the two 
groups. Average base composition did not differ between 
normal (A: 0.3352, T: 0.3104, C: 0.1390, G: 0.2152) and 
modified genomes (A: 0.3414, T: 0.3029, C: 0.1307, G: 
0.2250). Average GC content also did not differ between 
the two groups (35.4% and 35.5%, respectively). In contrast, 
normal and modified genomes did differ slightly in both aver- 
age AT skew (0.039 and 0.046, respectively; P= 0.087) and 
GC skew (-0.214 and -0.226, respectively; P< 0.001). 

Substitution Rates in Normal and Modified Genomes 

To test for differences in rates of evolution between normal 
and modified genomes, we estimated rates of synonymous 
(dS) and nonsynonymous (d/V) substitution across a maximum 
likelihood tree for each of the 13 mitochondrial protein- 
coding genes. Our results show that mitochondrial genes in 
modified genomes have a significantly elevated dS (P < 0.001 
for all genes) and d/V (P< 0.006 for all genes). Modified ge- 
nomes show, on average, a 2.66-fold increase in dS compared 
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Fig. 2. — Comparison of nonsynonymous and synonymous substitution rates for the 13 mitochondrial protein-coding genes. Normal genomes 
are on the left; modified genomes are on the right. The dashed lines separate mitochondrial genes that belong to different OXPHOS function complexes. 
For all comparisons, substitution rates of modified genomes are significantly higher than substitution rates of normal genomes. 



with normal genomes; COX1 shows the smallest increase 
(2.31 -fold), whereas ATP6 shows the greatest increase 
(3.17-fold). Modified genomes show, on average, a 3.07- 
fold increase in dN compared with normal genomes; ND4 
shows the smallest increase (2.42-fold) and COX2 shows 
the largest increase (4.50-fold). These results demonstrate 
that mitochondrial genome modification is correlated with 
an absolute increase in both synonymous and nonsynon- 
ymous substitution rates for mitochondrial protein-coding 
genes, independent of phytogeny (fig. 2). 

Although the mechanisms that give rise to the correlation 
between substitution rate and mitochondrial rearrangement 
remain unknown in salamanders as well as other taxa, 
a number of hypotheses have been proposed. Some such 
hypotheses state that an increase in substitution rates causes 
elevated rates of gene rearrangement. Specifically, increased 
substitutions accumulating in the sequences regulating repli- 
cation initiation and termination can decrease replication 
fidelity, resulting in duplications, deletions, and rearrange- 
ments (Shao et al. 2003). The high rates of substitution may 
also cause high rates of DNA damage and double-strand 
breaks, which can facilitate intramolecular recombination 
(Dowton and Campbell 2001). Alternatively, the presence of 
gene rearrangements may cause an increase in substitution 
rates by an undetermined mechanism. Differences in popula- 
tion biology have also been proposed to explain the correla- 
tion between elevated substitution rates and genomic 



modifications; relatively strong genetic drift would cause 
fixation of both point mutations and large-scale genomic 
modifications (Lynch et al. 2006; Boussau et al. 2011). 
Comparative studies that test for the signatures of these pu- 
tative molecular and demographic mechanisms are required 
to determine the cause of this correlation in salamanders as 
well as other taxa. 

Mutation Gradient in Normal and Modified Genomes 

To test for differences in gene placement along the mutation 
gradient between normal and modified mitochondrial ge- 
nomes, we first estimated the average mutation position for 
each gene in each species. We define average mutation po- 
sition as the duration spent single stranded, measured in base 
pairs. This measurement reflects the probability of mutation 
for a given position in the mitochondrial genome due to its 
exposure, in the single-stranded state, to the mutagenic mi- 
tochondrial environment during replication. We do not stan- 
dardize for total genome size, as done in previous studies on 
mitochondrial mutational gradients (Tanaka and Ozawa 1 994; 
Faith and Pollock 2003; Krishnan et al. 2004), because our 
goals include comparing estimates of mutation position for a 
given gene across genomes of different sizes. Specifically, we 
hypothesize that rearrangements and expansions may alter 
the mutation position of genes in modified genomes. Our 
results show that alterations to mutation position are 
restricted to a subset of genes in modified genomes. 
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Specifically, genes located upstream of both the 0 L and 
the 0 H — ND1 and ND2 (fig. 1) — experience a large increase in 
average mutation position in modified genomes (3.3 ± 1 .9 kb 
each) (fig. 3). This pattern reflects the fact that the majority of 
genomic expansions occur near the 0 H . During replication, 
such expansions are not encountered until replication has 
begun from the 0 L (fig. 1) and completed the majority of 
both strands. Therefore, only genes located upstream of 
both the 0 L and the 0 H — ND1 and ND2— spend a longer 
duration in the single-stranded state in expanded genomes. 
Of the remaining 1 1 genes positioned downstream of both 
replication origins, 9 are never involved in gene rearrange- 
ments and have very similar mutation positions in both mod- 
ified and normal genomes; on average, these genes shift less 
than 70 bp (fig. 3). In contrast, ND6 and CYTB have been 
involved in four independent gene rearrangements (Mueller 
and Boore 2005). Mutation position of ND6 either decreases 
by 4.0 kb or increases by up to 8.5 kb, depending on lineage- 
specific gene rearrangements. Mutation position of CYTB 
either decreases by 1 .2 kb or increases by up to 2.5 kb, 
depending on lineage-specific gene rearrangements. 



We then tested for a gradient generated by mutational 
bias during replication in normal and modified salamander 
genomes. We estimated base compositional asymmetry (i.e., 
AT skew) of the third position of 4-fold degenerate codons 
(P 4FD ) for each gene. A linear regression of AT skew on 
mutation position indicates significant positive correlation 
for normal genomes, though only a small proportion of com- 
positional variation can be explained by mutation position 
(y=0.000006874x+ 0.271, ^ = 0.056, P< 0.001). The pos- 
itive relationship between AT skew and mutation position is 
not significant in modified genomes (y= 0.00000391 3x+ 
0.318, a 2 = 0.018, P= 0.068), suggesting that modified ge- 
nomes are not at the same base compositional equilibrium 
as normal salamander genomes. However, this mutational 
gradient does not translate into a gradient in either d/V or dS 
(fig. 3). Thus, although the mutation positions of ND1, ND2, 
ND6, and CYTB are altered during genomic expansion and/ 
or rearrangement, their movement to this new position 
has no significant impact on their substitution rates. 
Salamanders have the lowest aerobic metabolic demands 
of any tetrapod vertebrates, suggesting that their levels of 
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ROS (a mutagenic byproduct of oxidative phosphorylation) 
may be low. This may underlie their weak mitochondrial 
mutational gradient, although comparative studies of this 
gradient across taxa with different metabolic rates are re- 
quired to test this hypothesis. 

Functional Constraints on Mitochondrial Genes in Normal 
and Modified Genomes 

To test whether mitochondrial genome expansions and 
rearrangements reflect an overall relaxation of selective 
constraint on mitochondrial function, we compared esti- 
mates of co (d/V/dS, a measure of the strength of selec- 
tion) for all 13 mitochondrial genes between normal and 
modified genomes. Specifically, we tested whether co is 
greater in modified genomes, indicative of weaker selec- 
tive constraint. Although both d/V and d5 are elevated for 
all genes in modified genomes, co values for genes in 
modified genomes are not significantly greater than co 
values in normal genomes (fig. 4; P> 0.189 for all 
genes). Thus, genes within modified genomes are not 
experiencing weaker selective constraint than genes 
within normal genomes. This result demonstrates that mi- 
tochondrial genome expansions and rearrangements likely 
do not reflect an overall relaxation of selective constraint 
on mitochondrial function. Such an overall relaxation 



would affect protein sequence evolution as well as tran- 
scriptional and translational efficiency, which are impacted 
by changes in genome size and gene order (Fernandez- 
Silva et al. 2003; Bonawitz et al. 2006; Satoh et al. 2010; 
Chong and Mueller 2013). In contrast, our results show 
that functional constraints on protein-coding sequences are 
not weaker in lineages with modified genomes than in 
lineages with normal genomes. 

Supplementary Material 

Supplementary file S1 is available at Genome Biology and 
Evolution online (http://www.gbe.oxfordjournals.org/). 
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